
fix: adapt SDK ProxyConfiguration to crawlee v4 API #596

Open
B4nan wants to merge 8 commits into fix/storage-client-v4-adapt from fix/proxy-configuration-v4-adapt

@B4nan B4nan commented Apr 30, 2026

Summary

Crawlee v4 reshaped ProxyConfiguration:

  • newProxyInfo / newUrl now take a single TieredProxyOptions argument; the (sessionId, options) pair is gone.
  • The protected _handleCustomUrl(sessionId) helper was removed.
  • _callNewUrlFunction / _handleTieredUrl take options only.
  • ProxyInfo (in @crawlee/types) no longer carries sessionId.

This PR adapts the SDK's override:

  • newProxyInfo and newUrl accept string | number | TieredProxyOptions | undefined — existing SDK callers that pass a raw sessionId keep working, and the override is also compatible with crawlee's v4 single-options signature. A small parseSessionIdOrOptions helper discriminates and pulls sessionId from options.request when no explicit one is given.
  • Inlined custom-URL session stickiness as getSessionIndex(sessionId) (replacing the removed _handleCustomUrl), keyed on the inherited usedProxyUrls map.
  • Re-declared sessionId?: string on the SDK's ProxyInfo interface so users can keep reading proxyInfo.sessionId.
  • ProxyInfo is now imported from @crawlee/types (no longer re-exported from @crawlee/core).
  • Tightened a .some(url => url.includes(...)) for the new (string | null)[] shape.
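
The argument discrimination described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `RequestLike`, `TieredProxyOptionsLike`, `ParsedArgs`, and `isPlainObject` are hypothetical stand-ins, and the assumption that `options.request` exposes a `sessionId` field is taken from the summary rather than verified against crawlee's types.

```typescript
// Hypothetical stand-ins for crawlee's types; the real shapes may differ.
interface RequestLike {
    sessionId?: string;
}

interface TieredProxyOptionsLike {
    request?: RequestLike;
}

interface ParsedArgs {
    sessionId?: string;
    options?: TieredProxyOptionsLike;
}

// True only for object literals and null-prototype objects,
// so Date, Array, Map, etc. are rejected.
function isPlainObject(value: unknown): boolean {
    if (typeof value !== 'object' || value === null) return false;
    const proto = Object.getPrototypeOf(value);
    return proto === Object.prototype || proto === null;
}

function parseSessionIdOrOptions(arg: unknown): ParsedArgs {
    // No argument: a sessionless call.
    if (arg === undefined) return {};

    // Legacy call shape: a raw session ID, normalised to a string.
    if (typeof arg === 'string' || typeof arg === 'number') {
        return { sessionId: String(arg) };
    }

    // v4 call shape: an options object; fall back to the request's session ID.
    if (isPlainObject(arg)) {
        const options = arg as TieredProxyOptionsLike;
        return { sessionId: options.request?.sessionId, options };
    }

    throw new TypeError('Expected a session ID (string or number) or a plain options object');
}
```

Under this sketch an empty object is accepted as a valid options argument, while `new Date()` or an array is rejected with a `TypeError`.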

Stacking

Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.

B4nan added 8 commits April 30, 2026 21:14
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions`
  argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the
  `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options
  only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.

Changes:
- `newProxyInfo` and `newUrl` accept `string | number |
  TieredProxyOptions | undefined` so existing SDK callers that pass a
  raw `sessionId` keep working, while the override remains compatible
  with crawlee's v4 signature. A small `parseSessionIdOrOptions`
  helper discriminates and pulls `sessionId` from `options.request`
  when no explicit one is given.
- Inlined custom-URL session stickiness via a new private
  `getSessionIndex(sessionId)` (replacing the removed
  `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface
  so users can still read `proxyInfo.sessionId` (v3 carried it on the
  base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported
  from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for
  the new `(string | null)[]` array shape.

Stacked on #583 (config redesign); rebases onto v4 once that lands.
- Custom URL rotation: post-increment the round-robin index so the
  first sessionless call returns proxyUrls[0] (was off-by-one).
- Surface `username` on the returned ProxyInfo by parsing it out of
  the resolved URL — v3 carried it via `super.newProxyInfo`.
- parseSessionIdOrOptions now rejects non-plain objects (e.g. Date,
  Array) so `newUrl(new Date())` throws as users expect.

test: `newUrl({})` is no longer 'invalid' — empty TieredProxyOptions
is a legal v4 call shape; documented the carve-out.
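
The off-by-one fix for custom URL rotation can be illustrated in isolation. This is a minimal sketch under assumed names (`RoundRobinUrls`, `nextIndex`, and `next` are hypothetical, not taken from the SDK): reading the index before advancing it is what makes the first sessionless call return `proxyUrls[0]`.

```typescript
// Hypothetical helper; the SDK keeps this state on ProxyConfiguration itself.
class RoundRobinUrls {
    private readonly urls: string[];
    private nextIndex = 0;

    constructor(urls: string[]) {
        if (urls.length === 0) throw new Error('At least one proxy URL is required');
        this.urls = urls;
    }

    next(): string {
        // Post-increment: use the current index, then advance it.
        // A pre-increment (++this.nextIndex) would skip urls[0] on the first call.
        return this.urls[this.nextIndex++ % this.urls.length];
    }
}
```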
…oxyInfo shape

- newUrl/newProxyInfo accept an optional second `legacyOptions`
  argument so existing callers that pass `(sessionId, {request})`
  keep working under the v4 shape too.
- Returned ProxyInfo omits Apify-only fields (groups, countryCode)
  when not using Apify Proxy and only includes `proxyTier` when
  defined — matches v3's strict-deep-equal expectations.
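
Why omitting fields matters for strict deep equality: an object with a key set to `undefined` is not strictly deep-equal to one that lacks the key. A minimal sketch of building the result with conditional spreads follows; `ProxyInfoSketch`, `BuildInput`, and `buildProxyInfo` are hypothetical names, and the field set is reduced to what this commit message mentions.

```typescript
// Hypothetical, reduced shapes for illustration only.
interface ProxyInfoSketch {
    url: string;
    groups?: string[];
    countryCode?: string;
    proxyTier?: number;
}

interface BuildInput {
    url: string;
    usesApifyProxy: boolean;
    groups: string[];
    countryCode?: string;
    proxyTier?: number;
}

function buildProxyInfo(input: BuildInput): ProxyInfoSketch {
    return {
        url: input.url,
        // Apify-only keys are added only when Apify Proxy is in use.
        ...(input.usesApifyProxy ? { groups: input.groups, countryCode: input.countryCode } : {}),
        // proxyTier is added only when defined; a tier of 0 still counts as defined.
        ...(input.proxyTier !== undefined ? { proxyTier: input.proxyTier } : {}),
    };
}
```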
…nfiguration tests

- ProxyInfo.username is now the decoded form (`user@name` rather
  than `user%40name`), matching v3 behaviour and the test
  expectations.
- Added a beforeEach to the `Actor.createProxyConfiguration()`
  describe that resets serviceLocator + Configuration.globalConfig +
  Actor._instance so each test sees the env vars it sets.
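
The decoded-username behaviour can be reproduced with the standard WHATWG `URL` class, whose `username` property stays percent-encoded. The helper name `usernameFromProxyUrl` and the example host are assumptions for illustration; the SDK's actual parsing code may look different.

```typescript
// Hypothetical helper: extract the decoded username from a proxy URL.
function usernameFromProxyUrl(proxyUrl: string): string | undefined {
    // URL.username is percent-encoded, e.g. "user%40name".
    const { username } = new URL(proxyUrl);
    // Decode it so callers see "user@name"; report undefined when no username is present.
    return username ? decodeURIComponent(username) : undefined;
}
```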
crawlee v4 (apify/crawlee#3599, beta.51) removed `tieredProxyUrls`,
`tieredProxyConfig`, `_handleTieredUrl`, and `proxyTier` from
`ProxyConfiguration` / `ProxyInfo`. The SDK's wrapper used to thread
those through to the base class; with the upstream API gone, that
plumbing has to go too.

- Remove the `tieredProxyConfig` field from the SDK's
  `ProxyConfigurationOptions`.
- Drop the constructor branch that forwarded `tieredProxyUrls` /
  `tieredProxyConfig` to the base class and the now-unreachable
  `_generateTieredProxyUrls` helper.
- Drop the `tieredProxyUrls` short-circuit and `proxyTier` field
  from `newUrl` / `newProxyInfo`.
- Drop the corresponding test groups in `proxy_configuration.test.ts`.
@B4nan B4nan force-pushed the fix/proxy-configuration-v4-adapt branch from b490925 to 4f718b5 April 30, 2026 19:15
@B4nan B4nan changed the base branch from v4 to fix/storage-client-v4-adapt May 6, 2026 09:26
@barjin barjin self-requested a review May 7, 2026 06:43
@barjin barjin left a comment

Thank you @B4nan! I have a few more talking points here.

Please see the comments below ⬇️

Comment on lines +674 to +676
// `tieredProxyUrls` / `tieredProxyConfig` were removed from
// crawlee v4 (apify/crawlee#3599); the corresponding test groups
// were dropped here too.

nit: can we do away with these "gravestone" comments?

These are useful when reading the AI output, but I don't think we should commit these. Future maintainers won't care about the tests that are not here anymore.

Comment on lines +323 to +327
sessionIdOrOptions?:
    | string
    | number
    | Parameters<CoreProxyConfiguration['newProxyInfo']>[0],
legacyOptions?: Parameters<CoreProxyConfiguration['newProxyInfo']>[0],

The removal of the sessionId parameter from Crawlee v4 was intentional, see this comment.

According to the new "UserPool" design, ProxyConfiguration shouldn't care about Session details (the resolved proxy URL is stored in a Session after it's retrieved, and the ProxyConfiguration is not queried again).

IMO, each call to newProxyInfo in the SDK should just return a random, valid proxy URL, e.g. one with a random session ID. This way, the SDK shields the user from the Apify Proxy session implementation.

proxyConfig.newProxyInfo() // { url: "http://session-131231@proxy.apify.com" }
proxyConfig.newProxyInfo() // { url: "http://session-234244@proxy.apify.com" }
proxyConfig.newProxyInfo() // { url: "http://session-342434@proxy.apify.com" }
// ...

As far as the consumer is concerned, these are just opaque URLs.
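
A minimal sketch of that idea, assuming a hypothetical `newOpaqueProxyUrl` helper. The URL shape is copied from the illustration above and is not the real Apify Proxy URL format; the six-digit numeric session ID is likewise an arbitrary choice.

```typescript
// Hypothetical helper: every call yields an independent URL with a fresh random session ID.
function newOpaqueProxyUrl(): string {
    // Random numeric ID in [0, 999999]; the real ID format is an assumption.
    const sessionId = Math.floor(Math.random() * 1_000_000);
    return `http://session-${sessionId}@proxy.apify.com`;
}
```

No state is kept between calls, so the caller (e.g. a Session) is responsible for storing the URL if it needs to reuse it.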

Comment on lines +440 to 450
private getSessionIndex(sessionId: string): number {
    if (!this.usedProxyUrls.has(sessionId)) {
        this.usedProxyUrls.set(
            sessionId,
            this.proxyUrls![
                this.usedProxyUrls.size % this.proxyUrls!.length
            ],
        );
    }
    return this.proxyUrls!.indexOf(this.usedProxyUrls.get(sessionId)!);
}

What is the reason behind this? Perhaps we can remove this, given the returned urls should be independent?
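
For context, the stickiness the quoted helper implements can be reproduced in isolation. This sketch wraps the same logic in a hypothetical `StickyUrlPicker` class with non-nullable fields; it is not the SDK's class, only a standalone illustration of the behaviour under discussion.

```typescript
// Hypothetical standalone wrapper around the quoted logic.
class StickyUrlPicker {
    private readonly proxyUrls: string[];
    private readonly usedProxyUrls = new Map<string, string>();

    constructor(proxyUrls: string[]) {
        this.proxyUrls = proxyUrls;
    }

    getSessionIndex(sessionId: string): number {
        if (!this.usedProxyUrls.has(sessionId)) {
            // A new session gets the next URL, chosen by how many sessions were seen so far.
            this.usedProxyUrls.set(
                sessionId,
                this.proxyUrls[this.usedProxyUrls.size % this.proxyUrls.length],
            );
        }
        // A known session always maps back to the URL it was first given.
        return this.proxyUrls.indexOf(this.usedProxyUrls.get(sessionId)!);
    }
}
```

With two URLs, sessions `s1`, `s2`, `s3` are assigned indices 0, 1, 0 in order of first appearance, and repeating `s1` returns 0 again.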
